We investigate the estimation of the extreme value index when the data are subject to random censoring.
1. Introduction

Let $X_1, \ldots, X_n$ be independent and identically distributed (i.i.d.) random variables,
distributed according to an unknown distribution function (df) $F$. A question of great
interest is how to obtain a good estimator for a quantile

\[ F^{\leftarrow}(1-\varepsilon) = \inf\{y : F(y) \ge 1-\varepsilon\}, \]

where $\varepsilon$ is so small that this quantile is situated on the border of, or beyond, the range of
the data. Estimating such extreme quantiles is directly linked to the accurate modeling
and estimation of the tail of the distribution

\[ \bar F(x) = 1 - F(x) = P(X > x) \]

for large thresholds $x$. From extreme value theory, the behaviour of such extreme quantile
estimators is known to be governed by one crucial parameter of the underlying distribution,
the extreme value index. This parameter is important since it measures the tail
heaviness of $F$. Its estimation has been widely studied in the literature: we mention,
for example, Hill (1975), Smith (1987), Dekkers et al. (1989) and Drees et al. (2004).

However, in classical applications such as the analysis of lifetime data (survival analysis,
reliability theory, insurance), a typical feature which appears is censorship. Quite often, 
X represents the time elapsed from the entry of a patient in, say, a follow-up study 
until death. If, at the time that the data collection is performed, the patient is still alive 
or has withdrawn from the study for some reason, the variable of interest X will not 
be available. A convenient way to model this situation is the introduction of a random
variable $Y$, independent of $X$, such that only

\[ Z = X \wedge Y \quad\text{and}\quad \delta = 1_{\{X \le Y\}} \tag{1} \]

are observed. The indicator variable $\delta$ determines whether or not $X$ has been censored.
Given a random sample $(Z_i, \delta_i)$, $1 \le i \le n$, of independent copies of $(Z, \delta)$, it is our goal
to make inference on the right tail of the unknown lifetime distribution function $F$, while
$G$, the df of $Y$, is considered to be a nonparametric nuisance parameter.

Statistics of extremes of randomly censored data is a new research field. The statistical 
problems in this field are difficult since, typically, only a small fraction of the data can be 
used for inference in the far tail of $F$ and, in the case of censoring, these data are, moreover,
not fully informative. The topic was first mentioned in Reiss and Thomas (1997),
Section 6.1, where an estimator of a positive extreme value index was introduced, but no 
(asymptotic) results were derived. Recently, Beirlant et al. (2007) proposed estimators 
for the general extreme value index and for an extreme quantile. That paper made a start 
on the analysis of the asymptotic properties of some estimators that use the data above 
a deterministic threshold and only under the Hall model. In this paper, we consider the 
"natural" estimators (which are based on the upper order statistics); our methodology 
is much more general and completely different to their approach. 

For almost all applications of extreme value theory, the estimation of the extreme value 
index is of primary importance. Consequently, it is the main aim of this paper to propose 
a unified method to prove asymptotic normality for various estimators of the extreme 
value index under random censoring. We apply our estimators to the problem of extreme 
quantile estimation under censoring. We illustrate our results with simulations and also 
apply our methods to AIDS survival data. 

We consider data on patients diagnosed with AIDS in Australia before 1 July 1991. 
The source of these data is Dr P.J. Solomon and the Australian National Centre in HIV 
Epidemiology and Clinical Research; see Venables and Ripley (2002). The information on
each patient includes gender, date of diagnosis, date of death or end of observation and an 
indicator as to which of the two is the case. The data set contains 2843 patients, of which 
1761 died; the other survival times are right-censored. We will apply our methodology 
to the 2754 male patients (there are only 89 women in the data set), of which 1708 died. 
Apart from assessing the heaviness of the right tail of the survival function $1 - F$ by
means of the estimation of the extreme value index, it is also important to estimate very 
high quantiles of F, thus obtaining a good indication of how long very strong men will 
survive AIDS. 

Another possible application, not pursued in this paper, is to annuity insurance con- 
tracts. Life annuities are contractual guarantees, issued by insurance companies, pension 
plans and government retirement systems, that offer promises to provide periodic income
over the lifetime of individuals. If we monitor the policyholders during a certain period, 
the data are right censored since many policyholders survive until the end of the obser- 
vation period. We are interested in the far right tail of the future lifetime distribution of 
the annuitants, since longevity is an important and difficult risk to evaluate for insurance 
companies. In the case of life annuities, it needs to be estimated as accurately as possible 
for setting adequate insurance premiums. 

We will study estimators for the extreme value index of F, assuming that F and G are 
both in the max-domain of attraction of an extreme value distribution. In Section 2, we 
introduce various estimators of this extreme value index and we establish, in a unified 
way, their asymptotic behaviors. We also introduce an estimator for very high quantiles. 
Various examples are given in Section 3 and a small simulation study is performed. Our 
estimators are applied to the AIDS data in Section 4. 



2. Estimators and main results 

Let $X_1, \ldots, X_n$ be a sequence of i.i.d. random variables from a df $F$. We denote the order
statistics by

\[ X_{1,n} \le \cdots \le X_{n,n}. \]

The weak convergence of the centered and standardized maxima $X_{n,n}$ implies the existence
of sequences of constants $a_n > 0$ and $b_n$ and a df $G$ such that

\[ \lim_{n\to\infty} P\left(\frac{X_{n,n} - b_n}{a_n} \le x\right) = G(x) \tag{2} \]

for all $x$ where $G$ is continuous. The work of Fisher and Tippett (1928), Gnedenko (1943)
and de Haan (1970) answered the question on the possible limits and characterized the
classes of distribution functions $F$ having a certain limit in (2).

This convergence result is our main assumption. Up to location and scale, the possible
limiting dfs $G$ in (2) are given by the so-called extreme value distributions $G_\gamma$, defined
by

\[ G_\gamma(x) = \begin{cases} \exp\left(-(1+\gamma x)^{-1/\gamma}\right), & \text{if } \gamma \ne 0, \\ \exp(-\exp(-x)), & \text{if } \gamma = 0. \end{cases} \tag{3} \]

We say that $F$ is in the (max-)domain of attraction of $G_\gamma$, denoting this by $F \in D(G_\gamma)$.
Here $\gamma$ is the extreme value index. Knowledge of $\gamma$ is crucial for estimating the right tail
of $F$.

We briefly review some estimators of $\gamma$ that have been proposed in the literature. The
most famous is probably the Hill (1975) estimator

\[ \hat\gamma^{(H)}_{X,k,n} := M^{(1)}_{X,k,n} := \frac{1}{k}\sum_{i=1}^{k} \log X_{n-i+1,n} - \log X_{n-k,n}, \tag{4} \]

where $k \in \{1, \ldots, n-1\}$. However, this estimator is only useful when $\gamma > 0$. A generalization
which works for any $\gamma \in \mathbb{R}$ is the so-called moment estimator, introduced in Dekkers
et al. (1989):

\[ \hat\gamma^{(M)}_{X,k,n} := M^{(1)}_{X,k,n} + S_{X,k,n} := M^{(1)}_{X,k,n} + 1 - \frac{1}{2}\left(1 - \frac{(M^{(1)}_{X,k,n})^2}{M^{(2)}_{X,k,n}}\right)^{-1}, \tag{5} \]

with

\[ M^{(2)}_{X,k,n} := \frac{1}{k}\sum_{i=1}^{k} (\log X_{n-i+1,n} - \log X_{n-k,n})^2. \]

The Hill estimator can be derived in several ways, a very appealing one being the slope
of the Pareto quantile plot, which consists of the points

\[ \left(\log\frac{n+1}{i},\ \log X_{n-i+1,n}\right), \quad i = 1, \ldots, k. \]

This plot has been generalized in Beirlant et al. (1996) by defining $UH_{i,n} = X_{n-i,n}\,\hat\gamma^{(H)}_{X,i,n}$
and considering the points

\[ \left(\log\frac{n+1}{i},\ \log UH_{i,n}\right), \quad i = 1, \ldots, k. \]

This generalized quantile plot becomes almost linear for small enough $k$, that is, for extreme
values. It follows immediately that the slope of this graph will estimate $\gamma$ regardless
of whether it is positive, negative or zero. An estimator of this slope is given by

\[ \hat\gamma^{(UH)}_{X,k,n} := \frac{1}{k}\sum_{i=1}^{k} \log UH_{i,n} - \log UH_{k+1,n}, \tag{6} \]

where $k \in \{1, \ldots, n-2\}$.
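The three estimators (4), (5) and (6) can be sketched numerically as follows. This is a minimal illustration, not the authors' implementation: the function names `hill`, `moment` and `uh` and the helper `_desc_logs` are hypothetical, the data are assumed to be strictly positive and free of ties, and the exact Pareto sample at the end is used only as a convenient sanity check.

```python
import numpy as np

def _desc_logs(x):
    """Logs of the sample sorted from largest to smallest:
    entry j is log X_{n-j,n}, j = 0, ..., n-1 (requires positive data)."""
    return np.log(np.sort(np.asarray(x, dtype=float))[::-1])

def hill(x, k):
    """Hill estimator (4): mean of the k largest logs minus log X_{n-k,n}."""
    logs = _desc_logs(x)
    return logs[:k].mean() - logs[k]

def moment(x, k):
    """Moment estimator (5): M1 + 1 - (1/2) * (1 - M1^2 / M2)^(-1)."""
    logs = _desc_logs(x)
    d = logs[:k] - logs[k]              # log-excesses over X_{n-k,n}
    m1, m2 = d.mean(), (d ** 2).mean()
    return m1 + 1.0 - 0.5 / (1.0 - m1 ** 2 / m2)

def uh(x, k):
    """UH estimator (6), with UH_{i,n} = X_{n-i,n} * Hill(i), i = 1, ..., k+1."""
    logs = _desc_logs(x)
    i = np.arange(1, k + 2)
    hill_i = np.cumsum(logs)[:k + 1] / i - logs[1:k + 2]   # Hill(i) for each i
    log_uh = logs[1:k + 2] + np.log(hill_i)
    return log_uh[:k].mean() - log_uh[k]

# Sanity check on an exact Pareto sample with extreme value index 0.5.
rng = np.random.default_rng(0)
sample = (1.0 - rng.random(20000)) ** (-0.5)
estimates = {f.__name__: f(sample, 1000) for f in (hill, moment, uh)}
```

On such a sample all three values in `estimates` should lie near 0.5; the Hill estimator is typically the least variable of the three when $\gamma > 0$.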

A quite different estimator of $\gamma$ is the so-called maximum likelihood (ML) estimator.
(Note that the classical, parametric ML approach is not applicable because $F$ is not
in a parametric family.) The approach relies on results in Balkema and de Haan (1974)
and Pickands (1975), stating that the limit distribution of the exceedances $E_j = X_j - t$
($X_j > t$) over a threshold $t$, when $t$ tends to the right endpoint of $F$, is given by a
generalized Pareto distribution depending on two parameters, $\gamma$ and $\sigma$. In practice, $t$ is
replaced by an order statistic $X_{n-k,n}$ and the resulting ML-estimators are denoted by
$\hat\gamma^{(ML)}_{X,k,n}$ and $\hat\sigma^{(ML)}_{X,k,n}$.

In the case of censoring, we would like to adapt all of these methods. Actually, we
will provide a general adaptation of estimators of the extreme value index and a unified
proof of their asymptotic normality; the four estimators above are special cases of this.
We assume that both $F$ and $G$ are absolutely continuous and that $F \in D(G_{\gamma_1})$ and
$G \in D(G_{\gamma_2})$ for some $\gamma_1, \gamma_2 \in \mathbb{R}$. The extreme value index of $H$, the df of $Z$ defined in
(1), exists and is denoted by $\gamma$. Let $\tau_F = \sup\{x : F(x) < 1\}$ (resp., $\tau_G$ and $\tau_H$) denote the
right endpoint of the support of $F$ (resp., $G$ and $H$). In the sequel, we assume that the
pair $(F, G)$ is in one of the following three cases:

\[ \begin{cases} \text{case 1: } \gamma_1 > 0,\ \gamma_2 > 0, & \text{in this case } \gamma = \dfrac{\gamma_1\gamma_2}{\gamma_1+\gamma_2}, \\[2mm] \text{case 2: } \gamma_1 < 0,\ \gamma_2 < 0,\ \tau_F = \tau_G, & \text{in this case } \gamma = \dfrac{\gamma_1\gamma_2}{\gamma_1+\gamma_2}, \\[2mm] \text{case 3: } \gamma_1 = \gamma_2 = 0,\ \tau_F = \tau_G = \infty, & \text{in this case } \gamma = 0. \end{cases} \tag{7} \]

(In case 3, we also define, for convenient presentation, $\frac{\gamma_1\gamma_2}{\gamma_1+\gamma_2} = \gamma = 0$.) The other possibilities
are not very interesting. Typically, they are very close to the "uncensored case",
which has been studied in detail in the literature (this holds, in particular, when $\gamma_1 < 0$
and $\gamma_2 > 0$) or the "completely censored situation", where estimation is impossible (this
holds, in particular, when $\gamma_1 > 0$ and $\gamma_2 < 0$).
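The expression for $\gamma$ in case 1 can be recovered from the independence of $X$ and $Y$. The short derivation below is a sketch under the standard characterization (not spelled out in the text) that a df in $D(G_{\gamma_1})$ with $\gamma_1 > 0$ has a regularly varying tail with index $-1/\gamma_1$; the slowly varying functions $\ell_F$ and $\ell_G$ are notation introduced here for illustration only.

```latex
\begin{align*}
1-H(x) &= P(X>x,\ Y>x) = (1-F(x))(1-G(x)) \\
       &= x^{-1/\gamma_1}\ell_F(x)\cdot x^{-1/\gamma_2}\ell_G(x)
        = x^{-(1/\gamma_1+1/\gamma_2)}\,\ell_F(x)\,\ell_G(x), \\
\frac{1}{\gamma} &= \frac{1}{\gamma_1}+\frac{1}{\gamma_2}
\quad\Longrightarrow\quad
\gamma = \frac{\gamma_1\gamma_2}{\gamma_1+\gamma_2}.
\end{align*}
```

For instance, $\gamma_1 = 1/4$ and $\gamma_2 = 2$ give $\gamma = (1/2)/(9/4) = 2/9$. In case 1, $\gamma$ is always smaller than both $\gamma_1$ and $\gamma_2$, so ignoring the censoring makes the tail of $F$ look lighter than it is.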

The first important point that should be mentioned is the fact that all of the preceding
estimators (Hill, moment, UH or ML) are obviously not consistent if they are based on
the sample $Z_1, \ldots, Z_n$, that is, if the censoring is not taken into account. Indeed, they
all converge to $\gamma$, the extreme value index of the Z-sample, and not to $\gamma_1$, the extreme
value index of $F$. Consequently, we must adapt all of these estimators to censoring. We
will divide all these estimators by the proportion of non-censored observations in the $k$
largest $Z$'s:

\[ \hat\gamma^{(c,\cdot)}_{Z,k,n} = \frac{\hat\gamma^{(\cdot)}_{Z,k,n}}{\hat p}, \quad\text{where}\quad \hat p = \frac{1}{k}\sum_{j=1}^{k} \delta_{[n-j+1,n]}, \]

with $\delta_{[1,n]}, \ldots, \delta_{[n,n]}$ being the $\delta$'s corresponding to $Z_{1,n}, \ldots, Z_{n,n}$, respectively. $\hat\gamma^{(\cdot)}_{Z,k,n}$
could be any estimator not adapted to censoring, in particular, $\hat\gamma^{(H)}_{Z,k,n}$, $\hat\gamma^{(M)}_{Z,k,n}$, $\hat\gamma^{(UH)}_{Z,k,n}$ or
$\hat\gamma^{(ML)}_{Z,k,n}$. It will follow that $\hat p$ estimates $\frac{\gamma_2}{\gamma_1+\gamma_2}$, hence $\hat\gamma^{(c,\cdot)}_{Z,k,n}$ estimates $\gamma$ divided by $\frac{\gamma_2}{\gamma_1+\gamma_2}$,
which is equal to $\gamma_1$. It is our main aim to study in detail the asymptotic normality of
these estimators.
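The adaptation can be illustrated on simulated data. The sketch below applies it to the Hill estimator only; the function name `adapted_hill` and the choice of two exact Pareto distributions (for which $p(z)$ is constant and equal to $\gamma_2/(\gamma_1+\gamma_2)$) are assumptions made for this example and do not come from the paper.

```python
import numpy as np

def adapted_hill(z, delta, k):
    """Return (adapted estimate, plain Hill estimate, p_hat) from the k largest Z's.

    z     : observed values Z_i = min(X_i, Y_i), assumed positive without ties
    delta : indicators, 1 if X_i <= Y_i (uncensored), 0 otherwise
    """
    order = np.argsort(z)
    zs = np.asarray(z, dtype=float)[order]
    ds = np.asarray(delta, dtype=float)[order]
    n = zs.size
    hill = np.log(zs[n - k:]).mean() - np.log(zs[n - k - 1])
    p_hat = ds[n - k:].mean()           # proportion of uncensored among the top k
    return hill / p_hat, hill, p_hat    # division fails if p_hat == 0

# Illustration: Pareto lifetimes (gamma1 = 0.5) censored by Pareto (gamma2 = 1).
rng = np.random.default_rng(1)
n, k = 20000, 2000
gamma1, gamma2 = 0.5, 1.0
x = (1.0 - rng.random(n)) ** (-gamma1)
y = (1.0 - rng.random(n)) ** (-gamma2)
z = np.minimum(x, y)
delta = (x <= y).astype(int)
est_adapted, est_plain, p_hat = adapted_hill(z, delta, k)
```

In this setting the plain Hill estimate should be close to $\gamma = 1/3$, `p_hat` close to $2/3$, and the adapted estimate close to $\gamma_1 = 1/2$.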

To illustrate the difference between the estimators, adapted and not adapted to censoring,
in Figure 1(a), we plot $\hat\gamma^{(UH)}_{Z,k,n}$ (dashed line) and $\hat\gamma^{(c,UH)}_{Z,k,n}$ (full line) as a function of
k for the AIDS survival data. We see a quite stable plot when k ranges from about 200 
(or 350) to 1200 and a substantial difference between the two estimators. Similar graphs 
could be presented for the other estimators. 

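Let us now consider the estimation of an extreme quantile $x_\varepsilon = F^{\leftarrow}(1-\varepsilon)$. Denoting
by $\hat F_n$ the Kaplan-Meier (1958) product-limit estimator, we can adapt the classical
estimators proposed in the literature as follows:

\[ \hat x^{(c,\cdot)}_{\varepsilon,k} = Z_{n-k,n} + \hat a^{(c,\cdot)}_{Z,k,n}\, \frac{\left(\dfrac{1-\hat F_n(Z_{n-k,n})}{\varepsilon}\right)^{\hat\gamma^{(c,\cdot)}_{Z,k,n}} - 1}{\hat\gamma^{(c,\cdot)}_{Z,k,n}}, \]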






J.H.J. Einmahl, A. Fils-Villetard and A. Guillou 



where

\[ \hat a^{(c,\cdot)}_{Z,k,n} = \frac{Z_{n-k,n}\, M^{(1)}_{Z,k,n}\,(1 - S_{Z,k,n})}{\hat p} \]

for M and UH, with $S_{Z,k,n}$ defined in (5), and

\[ \hat a^{(c,ML)}_{Z,k,n} = \frac{\hat\sigma^{(ML)}_{Z,k,n}}{\hat p}. \]



Note that these estimators are defined under the assumption that the two endpoints $\tau_F$
and $\tau_G$ are equal, but possibly infinite. This is true for the three cases defined in (7).
Also, note that we have excluded the Hill estimator since it only works in case 1. 
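A numerical sketch of the adapted quantile estimator, using the moment estimator as building block, is given below. It is an illustration under stated assumptions rather than the authors' code: the function names are hypothetical, ties among the $Z_i$ are assumed absent, the Kaplan-Meier survival estimate is evaluated by a direct product, and the simulated Pareto data are used only to have a known true quantile.

```python
import numpy as np

def km_survival_at(z_sorted, d_sorted, t):
    """Kaplan-Meier estimate of 1 - F(t) for ascending z_sorted (no ties assumed)."""
    n = z_sorted.size
    at_risk = n - np.arange(n)                        # n, n-1, ..., 1
    factors = np.where(d_sorted == 1, 1.0 - 1.0 / at_risk, 1.0)
    return float(np.prod(factors[z_sorted <= t]))     # empty product equals 1

def adapted_moment_quantile(z, delta, k, eps):
    """Censoring-adapted extreme quantile estimate built on the moment estimator."""
    order = np.argsort(z)
    zs = np.asarray(z, dtype=float)[order]
    ds = np.asarray(delta)[order]
    n = zs.size
    thr = zs[n - k - 1]                               # random threshold Z_{n-k,n}
    d = np.log(zs[n - k:]) - np.log(thr)
    m1, m2 = d.mean(), (d ** 2).mean()
    s = 1.0 - 0.5 / (1.0 - m1 ** 2 / m2)              # S_{Z,k,n} as in (5)
    p_hat = ds[n - k:].mean()
    gamma_c = (m1 + s) / p_hat                        # adapted moment estimator
    a_c = thr * m1 * (1.0 - s) / p_hat                # adapted scale estimator
    surv = km_survival_at(zs, ds, thr)
    # The formula below is undefined if gamma_c is exactly zero.
    return thr + a_c * ((surv / eps) ** gamma_c - 1.0) / gamma_c

# Illustration: Pareto lifetimes (gamma1 = 0.5) censored by Pareto (gamma2 = 1).
rng = np.random.default_rng(3)
n = 20000
x = (1.0 - rng.random(n)) ** (-0.5)
y = (1.0 - rng.random(n)) ** (-1.0)
z = np.minimum(x, y)
delta = (x <= y).astype(int)
q_hat = adapted_moment_quantile(z, delta, k=2000, eps=0.001)
true_q = 0.001 ** (-0.5)                              # true quantile of F
```

Here `true_q` is about 31.6; `q_hat` should be of the same order of magnitude, with a spread that reflects the sensitivity of extreme quantile estimates to the estimated index.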

Again, to observe the difference between the adapted and non-adapted estimators, in 
Figure 1(b), we plot $\hat x^{(UH)}_{0.001,k}$ (dashed line) and $\hat x^{(c,UH)}_{0.001,k}$ (full line) for the AIDS data. The
difference between the two estimators (for k between 250 and 500) is about 10 years. 

Beirlant et al. (2007) considered asymptotic properties of some of these estimators,
when $Z_{n-k,n}$ is replaced by a deterministic $t$ in the preceding formulas and only under
the Hall model. Also, note that the asymptotic bias of these estimators has not been
studied. Our aim in this paper is to establish the asymptotic normality (including bias
and variance) of all of the above estimators of the extreme value index (based on $k+1$
upper order statistics or, equivalently, on a random threshold $Z_{n-k,n}$). We use a general
approach that separates extreme value theory and censoring. Therefore, in the proof, we 
can treat the above four estimators (and others) simultaneously. 

Figure 1. UH-estimator adapted (full line) and not adapted (dashed line) to censoring (a) for
the extreme value index and (b) for the extreme quantile with $\varepsilon = 0.001$ for the AIDS survival
data.

To specify the asymptotic bias of the different estimators, we use a second-order condition
phrased in terms of the tail quantile function $U_H(x) = H^{\leftarrow}(1 - \frac{1}{x})$. From the theory
of generalized regular variation of second-order outlined in de Haan and Stadtmüller
(1996), we assume the existence of a positive function $a$ and a second eventually positive
function $a_2$ with $\lim_{x\to\infty} a_2(x) = 0$, such that the limit

\[ \lim_{x\to\infty} \frac{1}{a_2(x)}\left\{\frac{U_H(ux) - U_H(x)}{a(x)} - h_\gamma(u)\right\} = k(u) \tag{9} \]

exists for $u \in (0, \infty)$, with $h_\gamma(u) = \int_1^u z^{\gamma-1}\,dz$. It follows that there exists a $c \in \mathbb{R}$ and a
second-order parameter $\rho \le 0$ for which the function $a$ satisfies

\[ \lim_{x\to\infty} \left\{\frac{a(ux)}{a(x)} - u^\gamma\right\}\Big/ a_2(x) = c\,u^\gamma h_\rho(u). \tag{10} \]

The function $a_2$ is regularly varying with index $\rho$. As usual, we will assume that $\rho < 0$
and we will also assume that the slowly varying part of $a_2$ is asymptotically equivalent
to a positive constant, which can and will always be taken equal to 1. For an appropriate
choice of the function $a$, the function $k$ that appears in (9) admits the representation

\[ k(u) = A\,h_{\gamma+\rho}(u), \tag{11} \]

with $A \ne 0$; $c$ in (10) is now equal to 0. We denote the class of second-order regularly
varying functions $U_H$ (satisfying (9)-(11) with $c = 0$) by $GRV_2(\gamma, \rho; a(x), a_2(x); A)$.

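From Vanroelen (2003), we obtain the following representations of $U_H$ (see also the
Appendix in Draisma et al. (1999)):

• $0 < -\rho < \gamma$: for $U_H \in GRV_2(\gamma, \rho; \ell_+ x^\gamma, a_2(x); A)$,

\[ U_H(x) = \ell_+ x^\gamma\left\{\frac{1}{\gamma} + \frac{A}{\gamma+\rho}\,a_2(x)(1+o(1))\right\}; \]

• $\gamma = -\rho$: for $U_H \in GRV_2(\gamma, -\gamma; \ell_+ x^\gamma, x^{-\gamma}\ell_2(x); A)$,

\[ U_H(x) = \ell_+ x^\gamma\left\{\frac{1}{\gamma} + x^{-\gamma}L_2(x)\right\} \]

with $L_2(x) = B + \int_1^x (A + o(1))\frac{\ell_2(t)}{t}\,dt + o(\ell_2(x))$ for some constant $B$ and some slowly
varying function $\ell_2$;

• $0 < \gamma < -\rho$: for $U_H \in GRV_2(\gamma, \rho; \ell_+ x^\gamma, a_2(x); A)$,

\[ U_H(x) = \ell_+ x^\gamma\left\{\frac{1}{\gamma} + D x^{-\gamma} + \frac{A}{\gamma+\rho}\,a_2(x)(1+o(1))\right\} \]

(so $D = \frac{1}{\ell_+}\lim_{x\to\infty}\{U_H(x) - a(x)/\gamma\}$);

• $\gamma = 0$: for $U_H \in GRV_2(0, \rho; \ell_+, a_2(x); A)$,

\[ U_H(x) = \ell_+ \log x + D + \frac{A\ell_+}{\rho}\,a_2(x)(1+o(1)); \]

• $\gamma < 0$: for $U_H \in GRV_2(\gamma, \rho; \ell_+ x^\gamma, a_2(x); A)$,

\[ U_H(x) = \tau_H - \ell_+ x^\gamma\left\{\frac{1}{-\gamma} - \frac{A}{\gamma+\rho}\,a_2(x)(1+o(1))\right\}, \]

where $\ell_+ > 0$, $A \ne 0$, $D \in \mathbb{R}$.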

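In the statement of our results, we use the following notation, similar to that used in
Beirlant et al. (2005):

\[ b(x) = \begin{cases}
\dfrac{A\rho[\rho + \gamma(1-\rho)]}{(\gamma+\rho)(1-\rho)}\,a_2(x), & \text{if } 0 < -\rho < \gamma \text{ or if } 0 < \gamma < -\rho \text{ with } D = 0, \\[2mm]
-\dfrac{\gamma^3}{(1+\gamma)}\,x^{-\gamma}L_2(x), & \text{if } \gamma = -\rho, \\[2mm]
-\dfrac{\gamma^3 D}{(1+\gamma)}\,x^{-\gamma}, & \text{if } 0 < \gamma < -\rho \text{ with } D \ne 0, \\[2mm]
\dfrac{1}{\log^2 x}, & \text{if } \gamma = 0, \\[2mm]
\dfrac{A\rho(1-\gamma)}{(1-\gamma-\rho)}\,a_2(x), & \text{if } \gamma < \rho, \\[2mm]
-\dfrac{\gamma}{1-2\gamma}\,\dfrac{\ell_+}{\tau_H}\,x^\gamma, & \text{if } \rho < \gamma < 0, \\[2mm]
\left[A(1-\gamma) - \dfrac{\ell_+}{\tau_H}\right]\dfrac{\gamma}{1-2\gamma}\,x^\gamma, & \text{if } \gamma = \rho
\end{cases} \]

and

\[ \tilde\rho = \begin{cases}
-\gamma, & \text{if } 0 < \gamma < -\rho \text{ with } D \ne 0, \\
\rho, & \text{if } -\rho \le \gamma \text{ or if } 0 < \gamma < -\rho \text{ with } D = 0, \text{ or if } \gamma < \rho, \\
\gamma, & \text{if } \rho \le \gamma \le 0.
\end{cases} \]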

Before stating our main result, define

\[ p(z) = P(\delta = 1 \mid Z = z). \]

It follows that

\[ p(z) = \frac{(1-G(z))f(z)}{(1-G(z))f(z) + (1-F(z))g(z)}, \tag{12} \]

where $f$ and $g$ denote the densities of $F$ and $G$, respectively. Note that, in cases 1 and
2, $\lim_{z\to\tau_H} p(z)$ exists and is equal to $\frac{\gamma_2}{\gamma_1+\gamma_2} =: p \in (0,1)$. Assume that, in case 3, this
limit also exists and is positive and again denote it by $p$. By convention, we also define
$\frac{\gamma_2}{\gamma_1+\gamma_2} = p$ for that case.

In the sequel, $k = k_n$ is an intermediate sequence, that is, a sequence such that $k \to \infty$
and $\frac{k}{n} \to 0$, as $n \to \infty$. Our main result now reads as follows.
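Theorem 1. Under the assumptions that, for $n \to \infty$,

\[ \begin{cases} \sqrt{k}\,a_2\!\left(\dfrac{n}{k}\right) \to \alpha_1 \in \mathbb{R} & \text{for the ML-estimator,} \\[2mm] \sqrt{k}\,b\!\left(\dfrac{n}{k}\right) \to \alpha_1 \in \mathbb{R} & \text{for the other three estimators,} \end{cases} \tag{13} \]

\[ \frac{1}{\sqrt{k}}\sum_{i=1}^{k}\left[p\!\left(H^{\leftarrow}\!\left(1 - \frac{i}{n}\right)\right) - p\right] \to \alpha_2 \in \mathbb{R} \tag{14} \]

and

\[ \sqrt{k}\sup_{\{1-k/n \le t < 1,\ |t-s| \le C\sqrt{k}/n,\ s < 1\}} |p(H^{\leftarrow}(t)) - p(H^{\leftarrow}(s))| \to 0 \quad\text{for all } C > 0, \tag{15} \]

we have, for the four estimators (for the Hill estimator, we assume case 1 holds and for
the ML-estimator, that $\gamma > -\frac{1}{2}$),

\[ \sqrt{k}\,(\hat\gamma^{(c,\cdot)}_{Z,k,n} - \gamma_1) \xrightarrow{d} N\!\left(\frac{1}{p}(\alpha_1 b_0 - \gamma_1\alpha_2),\ \frac{\sigma^2 + \gamma_1^2\,p(1-p)}{p^2}\right), \]

where $\alpha_1 b_0$ (resp., $\sigma^2$) denotes the bias (resp., the variance) of $\sqrt{k}\,(\hat\gamma^{(\cdot)}_{Z,k,n} - \gamma)$.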
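As a quick check of how the limiting variance specializes, consider the Hill estimator in case 1. The computation below is a sketch that assumes the limiting variance in Theorem 1 has the form $(\sigma^2 + \gamma_1^2 p(1-p))/p^2$ and uses the well-known asymptotic variance $\sigma^2 = \gamma^2$ of the uncensored Hill estimator, together with $p = \gamma_2/(\gamma_1+\gamma_2) = \gamma/\gamma_1$.

```latex
\begin{align*}
\frac{\sigma^2+\gamma_1^2\,p(1-p)}{p^2}
 &= \frac{\gamma^2}{p^2}+\gamma_1^2\,\frac{1-p}{p}
  = \gamma_1^2+\gamma_1^2\,\frac{1-p}{p}
  = \frac{\gamma_1^2}{p}
  = \frac{\gamma_1^3}{\gamma}.
\end{align*}
```

Since $\gamma < \gamma_1$ in case 1, this exceeds the variance $\gamma_1^2$ that the Hill estimator would have without censoring, which quantifies the price paid for the censored observations.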

This leads to the following corollary, the proof of which is rather straightforward. For 
the Hill estimator, the asymptotic bias-term follows easily from direct computations and 
for the other three estimators, it follows from the expressions for the asymptotic bias- 
terms of the corresponding "uncensored" estimators: see Beirlant et al. (2005) and Drees
et al. (2004). 

Corollary 1. Under the assumptions of Theorem 1, we have 
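
\[ \sqrt{k}\,(\hat\gamma^{(c,H)}_{Z,k,n} - \gamma_1) \xrightarrow{d} N\!\left(\mu^{(c,H)},\ \frac{\gamma_1^3}{\gamma}\right) \quad\text{in case 1;} \]

\[ \sqrt{k}\,(\hat\gamma^{(c,M)}_{Z,k,n} - \gamma_1) \xrightarrow{d} \begin{cases}
N\!\left(\mu^{(c,M)},\ \dfrac{\gamma_1^2}{\gamma^2}(1+\gamma_1\gamma)\right), & \text{in case 1,} \\[3mm]
N\!\left(\mu^{(c,M)},\ \dfrac{\gamma_1^2}{\gamma^2}\left[\dfrac{(1-\gamma)^2(1-2\gamma)(1-\gamma+6\gamma^2)}{(1-4\gamma)(1-3\gamma)} + \gamma_1\gamma - \gamma^2\right]\right), & \text{in case 2,} \\[3mm]
N\!\left(\mu^{(c,M)},\ p^{-2}\right), & \text{in case 3;}
\end{cases} \]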
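\[ \sqrt{k}\,(\hat\gamma^{(c,UH)}_{Z,k,n} - \gamma_1) \xrightarrow{d} \begin{cases}
N\!\left(\mu^{(c,UH)},\ \dfrac{\gamma_1^2}{\gamma^2}(1+\gamma_1\gamma)\right), & \text{in case 1,} \\[3mm]
N\!\left(\mu^{(c,UH)},\ \dfrac{\gamma_1^2}{\gamma^2}\left[\dfrac{(1-\gamma)(1+\gamma+2\gamma^2)}{1-2\gamma} + \gamma_1\gamma - \gamma^2\right]\right), & \text{in case 2,} \\[3mm]
N\!\left(\mu^{(c,UH)},\ p^{-2}\right), & \text{in case 3;}
\end{cases} \]

\[ \sqrt{k}\,(\hat\gamma^{(c,ML)}_{Z,k,n} - \gamma_1) \xrightarrow{d} N\!\left(\mu^{(c,ML)},\ p^{-2}[1 + \gamma(2+\gamma_1)]\right), \quad\text{in cases 1, 3 and 2 with } \gamma > -\tfrac{1}{2}, \]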



where 



,ic,H) 



71(12 Cti 



7 



7ia2 



P 



ai 
P 



1 



,{c,UH) 



Ac, ML) 



71(12 



27-1 

1-27 A(l-7)^-(7+l)^ 

(l-7)(l-37) Ail-^)-^ 
1-27 

l-27-p' 
1 1, 

ai 



in case 1, 

in case 2, if p < 7, 

in case 2, if p ~ 7, 

in case 2, if ^ < p, 
in case 3: 



P P(l-P)' 
71^2 , ai p{'j + l)A 



P P (1 -p)(l-P + 7) 
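
Proof of Theorem 1. We consider the following decomposition:

\[ \sqrt{k}\,(\hat\gamma^{(c,\cdot)}_{Z,k,n} - \gamma_1) = \frac{1}{\hat p}\sqrt{k}\,(\hat\gamma^{(\cdot)}_{Z,k,n} - \gamma) + \frac{\gamma_1}{\hat p}\sqrt{k}\left(\frac{\gamma_2}{\gamma_1+\gamma_2} - \hat p\right). \tag{16} \]

The asymptotic behavior of $\sqrt{k}\,(\hat\gamma^{(\cdot)}_{Z,k,n} - \gamma)$ is well known since this estimator is based on
the Z-sample, that is, on the uncensored situation; see Beirlant et al. (2005) and Drees
et al. (2004).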
First, note that in case 3, $\gamma_1 = \gamma = 0$. Therefore, the second term in the decomposition
(16) is exactly 0 provided $\hat p > 0$. That means that this case follows, since $\hat p \xrightarrow{P} p > 0$. We
now focus in detail on the second term of the decomposition in (16) for the cases 1 and
2.

To this end, consider the following construction. Let $Z$ be a random variable with df
$H$. Let $U$ have a uniform(0, 1) distribution and be independent of $Z$. Define

\[ \delta = \begin{cases} 1, & \text{if } U \le p(Z), \\ 0, & \text{if } U > p(Z) \end{cases} \quad\text{and}\quad \tilde\delta = \begin{cases} 1, & \text{if } U \le p, \\ 0, & \text{if } U > p. \end{cases} \]

We repeat this construction independently $n$ times. It is easy to show that the resulting
pairs $(Z_i, \delta_i)$, $i = 1, \ldots, n$, have the same distribution as the initial pairs $(Z_i, \delta_i)$, $i =
1, \ldots, n$, for all $n \in \mathbb{N}$, so we continue with the new pairs $(Z_i, \delta_i)$.
Moreover, $Z$ and $\tilde\delta$ are clearly independent and satisfy

\[ P(|\delta - \tilde\delta| = 1 \mid Z = z) = |p - p(z)|. \]
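The coupling identity can be checked by simulation for a fixed value of $p(z)$. The sketch below is illustrative only: the function name `mismatch_rate` and the numerical values are assumptions, and $z$ enters only through the number $p(z)$.

```python
import numpy as np

def mismatch_rate(p_z, p, n_rep=200_000, seed=0):
    """Monte Carlo estimate of P(|delta - delta_tilde| = 1 | Z = z).

    Both indicators are built from the same uniform variable U:
    delta = 1{U <= p(z)} and delta_tilde = 1{U <= p}.
    """
    rng = np.random.default_rng(seed)
    u = rng.random(n_rep)
    delta = u <= p_z
    delta_tilde = u <= p
    return float(np.mean(delta != delta_tilde))

# The two indicators differ exactly when U falls between p and p(z).
rate = mismatch_rate(p_z=0.95, p=8.0 / 9.0)
target = abs(0.95 - 8.0 / 9.0)
```

Because the two indicators share the same $U$, the simulated `rate` should agree with `target` up to Monte Carlo error.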

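Consider the order statistics $Z_{1,n} \le \cdots \le Z_{n,n}$ and denote the induced order statistics of
the $U$'s by $U_{[1,n]}, \ldots, U_{[n,n]}$. We can write $\hat p$ as follows:

\[ \hat p = \frac{1}{k}\sum_{j=1}^{k} 1_{\{U_{[n-j+1,n]} \le p(Z_{n-j+1,n})\}} \]

and, similarly,

\[ \tilde p := \frac{1}{k}\sum_{j=1}^{k} 1_{\{U_{[n-j+1,n]} \le p\}}. \]

Clearly, $U_{[1,n]}, \ldots, U_{[n,n]}$ are i.i.d. and independent of the Z-sample.
We use the following decomposition:

\[ \sqrt{k}\,(\hat p - p) = \sqrt{k}\,(\hat p - \tilde p) + \sqrt{k}\,(\tilde p - p). \tag{17} \]

Since $\tilde p \stackrel{d}{=} \frac{1}{k}\sum_{i=1}^{k} 1_{\{U_i \le p\}}$, we have

\[ \sqrt{k}\,(\tilde p - p) \xrightarrow{d} N(0, p(1-p)). \]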

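Now, we are interested in $\sqrt{k}\,(\hat p - \tilde p)$, which turns out to be a bias term. It can be rewritten
as follows:

\[ \sqrt{k}\,(\hat p - \tilde p) = \frac{1}{\sqrt{k}}\sum_{j=1}^{k}\left[1_{\{U_{[n-j+1,n]} \le p(Z_{n-j+1,n})\}} - 1_{\{U_{[n-j+1,n]} \le p\}}\right] \]

\[ \stackrel{d}{=} \frac{1}{\sqrt{k}}\sum_{j=1}^{k}\left[1_{\{U_j \le p(Z_{n-j+1,n})\}} - 1_{\{U_j \le p(H^{\leftarrow}(1-j/n))\}}\right] + \frac{1}{\sqrt{k}}\sum_{j=1}^{k}\left[1_{\{U_j \le p(H^{\leftarrow}(1-j/n))\}} - 1_{\{U_j \le p\}}\right] \]

\[ =: T_{1,k} + T_{2,k}. \]

Under the assumptions (14) and (15), the convergence in probability of $T_{2,k}$ to $\alpha_2$ then
follows from a result in Chow and Teicher (1997), page 356.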

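So we now need to show that $T_{1,k} \xrightarrow{P} 0$. To this end, write $V_i = H(Z_i)$ so that $Z_i =
H^{\leftarrow}(V_i)$. The $V_i$ are i.i.d. uniform(0, 1). Also, write $r(t) = p(H^{\leftarrow}(t))$. Then

\[ T_{1,k} = \frac{1}{\sqrt{k}}\sum_{j=1}^{k}\left[1_{\{U_j \le r(V_{n-j+1,n})\}} - 1_{\{U_j \le r(1-j/n)\}}\right]. \]

By the weak convergence of the uniform tail quantile process, we have, uniformly in
$1 \le j \le k$,

\[ V_{n-j+1,n} - \left(1 - \frac{j}{n}\right) = O_P\!\left(\frac{\sqrt{k}}{n}\right). \]

Let $\eta > 0$. Using (15), we have, with arbitrarily high probability, for large $n$,

\[ |T_{1,k}| \le \frac{1}{\sqrt{k}}\sum_{j=1}^{k}\left|1_{\{U_j \le r(V_{n-j+1,n})\}} - 1_{\{U_j \le r(1-j/n)\}}\right| \stackrel{d}{=} \frac{1}{\sqrt{k}}\sum_{j=1}^{k} 1_{\{U_j \le |r(V_{n-j+1,n}) - r(1-j/n)|\}} \le \frac{1}{\sqrt{k}}\sum_{j=1}^{k} 1_{\{U_j \le \eta/\sqrt{k}\}}. \]

Using the aforementioned result in Chow and Teicher (1997), page 356, and the fact that
$\eta > 0$ can be chosen arbitrarily small, $T_{1,k} \xrightarrow{P} 0$ follows.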
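Finally, combining (16) and (17) yields

\[ \sqrt{k}\,(\hat\gamma^{(c,\cdot)}_{Z,k,n} - \gamma_1) = \frac{1}{p}\left(\sqrt{k}\,(\hat\gamma^{(\cdot)}_{Z,k,n} - \gamma) - \gamma_1\sqrt{k}\,(\tilde p - p)\right) - \frac{\gamma_1\alpha_2}{p} + o_P(1), \tag{18} \]

with the two terms within the brackets independent since the first is based on the Z-sample
and the second on the U-sample. Therefore, under the assumptions (13)-(15), we
have

\[ \sqrt{k}\,(\hat\gamma^{(c,\cdot)}_{Z,k,n} - \gamma_1) \xrightarrow{d} N\!\left(\frac{1}{p}(\alpha_1 b_0 - \gamma_1\alpha_2),\ \frac{\sigma^2 + \gamma_1^2\,p(1-p)}{p^2}\right). \qquad \Box \]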



3. Examples and small simulation study 

In this section, we consider three examples: first, a Burr distribution censored by another 
Burr distribution (hence an example of case 1), second a reverse Burr distribution cen- 
sored by another reverse Burr distribution (an example of case 2) and finally a logistic 
distribution censored by a logistic distribution (case 3). We show that these distributions 
satisfy all of the assumptions and calculate the bias terms explicitly. In particular, we will 
see how assumptions (13) and (14) compare. We also provide simulations to illustrate 
the behavior of our estimators for these distributions. 

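Example 1. $X \sim \mathrm{Burr}(\beta_1, \tau_1, \lambda_1)$ and $Y \sim \mathrm{Burr}(\beta_2, \tau_2, \lambda_2)$, $\beta_1, \tau_1, \lambda_1, \beta_2, \tau_2, \lambda_2 > 0$.
In that case,

\[ 1 - F(x) = \left(\frac{\beta_1}{\beta_1 + x^{\tau_1}}\right)^{\lambda_1} = x^{-\tau_1\lambda_1}\beta_1^{\lambda_1}(1 + \beta_1 x^{-\tau_1})^{-\lambda_1}, \quad x > 0; \]

\[ 1 - G(x) = \left(\frac{\beta_2}{\beta_2 + x^{\tau_2}}\right)^{\lambda_2} = x^{-\tau_2\lambda_2}\beta_2^{\lambda_2}(1 + \beta_2 x^{-\tau_2})^{-\lambda_2}, \quad x > 0. \]

We can infer that

\[ U_H(x) = H^{\leftarrow}\!\left(1 - \frac{1}{x}\right) = (\beta_1^{\lambda_1}\beta_2^{\lambda_2}x)^{1/(\lambda_1\tau_1+\lambda_2\tau_2)}\left[1 - \gamma\eta\,(\beta_1^{\lambda_1}\beta_2^{\lambda_2}x)^{\rho}(1+o(1))\right], \]

with

\[ \tau = \min(\tau_1, \tau_2), \quad \rho = -\gamma\tau \]

and

\[ \eta = \begin{cases} \lambda_1\beta_1, & \text{if } \tau_1 < \tau_2, \\ \lambda_2\beta_2, & \text{if } \tau_1 > \tau_2, \\ \lambda_1\beta_1 + \lambda_2\beta_2, & \text{if } \tau_1 = \tau_2. \end{cases} \]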
The parameters of interest are

\[ \gamma_1 = \frac{1}{\lambda_1\tau_1}, \quad \gamma_2 = \frac{1}{\lambda_2\tau_2} \quad\text{and}\quad \gamma = \frac{1}{\lambda_1\tau_1 + \lambda_2\tau_2}. \]

First, we check assumption (15). Using the above approximation of $H^{\leftarrow}$, it follows, for
$s \le t < 1$ and $s$ large enough, that

\[ |p(H^{\leftarrow}(t)) - p(H^{\leftarrow}(s))| \le C\left((1-s)^{\gamma\tau} - (1-t)^{\gamma\tau}\right) \]

for some $C > 0$. It now easily follows that in the case $\gamma\tau \ge 1$, the left-hand side of
(15) tends to 0. In the case $\gamma\tau < 1$, the left-hand side of (15) is of order $\sqrt{k}\,(\frac{\sqrt{k}}{n})^{\gamma\tau} =
\sqrt{k}\,(\frac{k}{n})^{\gamma\tau}k^{-\gamma\tau/2}$, which tends to 0 when (14) holds (see below).

The asymptotic bias of $\sqrt{k}\,(\hat\gamma^{(\cdot)}_{Z,k,n} - \gamma)$ can be explicitly computed (from Corollary 1)
and is asymptotically equivalent to 



( IP 

p(l+7)(7 + p) 
(l-p)(l-p + 7)' 
P[P + 7(1 -P)] 
(X-pf ' 



for the Hill estimator, 
for the ML-estimator, 
for the moment and UH-estimators.



They are all of the same order. 

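We obtain another bias term from assumption (14). Direct computations, using (12)
and $p = \frac{\gamma_2}{\gamma_1+\gamma_2}$, lead to

\[ p(z) - p = \frac{\gamma^2}{\gamma_1\gamma_2}\left[-\beta_1 z^{-\tau_1}(1+o(1)) + \beta_2 z^{-\tau_2}(1+o(1))\right] \]

when $\tau_1 \ne \tau_2$, or $\tau_1 = \tau_2$ and $\beta_1 \ne \beta_2$. Consequently, assumption (14) is equivalent to

\[ \frac{\gamma^2}{\gamma_1\gamma_2}\,\frac{\tilde\beta}{1-\rho}\,(\beta_1^{\lambda_1}\beta_2^{\lambda_2})^{\rho}\,\sqrt{k}\left(\frac{n}{k}\right)^{\rho} \to \alpha_2, \]

with

\[ \tilde\beta = \begin{cases} -\beta_1, & \text{if } \tau_1 < \tau_2, \\ \beta_2, & \text{if } \tau_1 > \tau_2, \\ \beta_2 - \beta_1, & \text{if } \tau_1 = \tau_2. \end{cases} \]

So both bias terms are of the same order. Only when $\tau_1 = \tau_2$ and $\beta_1 = \beta_2$ (in particular,
when $F = G$) the biases of the estimators of $\gamma$ dominate.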



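Example 2. $X \sim$ reverse Burr$(\beta_1, \tau_1, \lambda_1, x_+)$ and $Y \sim$ reverse Burr$(\beta_2, \tau_2, \lambda_2, x_+)$,
$\beta_1, \tau_1, \lambda_1, \beta_2, \tau_2, \lambda_2, x_+ > 0$.
In that case,

\[ 1 - F(x) = \left(\frac{\beta_1}{\beta_1 + (x_+ - x)^{-\tau_1}}\right)^{\lambda_1} = (x_+ - x)^{\tau_1\lambda_1}\beta_1^{\lambda_1}(1 + \beta_1(x_+ - x)^{\tau_1})^{-\lambda_1}, \quad x < x_+; \]

\[ 1 - G(x) = \left(\frac{\beta_2}{\beta_2 + (x_+ - x)^{-\tau_2}}\right)^{\lambda_2} = (x_+ - x)^{\tau_2\lambda_2}\beta_2^{\lambda_2}(1 + \beta_2(x_+ - x)^{\tau_2})^{-\lambda_2}, \quad x < x_+. \]

Define $\tau$ and $\eta$ as in Example 1, but now set $\rho = \gamma\tau$. We can infer that

\[ U_H(x) = H^{\leftarrow}\!\left(1 - \frac{1}{x}\right) = x_+ - (\beta_1^{\lambda_1}\beta_2^{\lambda_2}x)^{-1/(\tau_1\lambda_1+\tau_2\lambda_2)}\left[1 - \gamma\eta\,(\beta_1^{\lambda_1}\beta_2^{\lambda_2}x)^{\rho}(1+o(1))\right]. \]

The parameters of interest are

\[ \gamma_1 = -\frac{1}{\lambda_1\tau_1}, \quad \gamma_2 = -\frac{1}{\lambda_2\tau_2}, \quad \gamma = -\frac{1}{\lambda_1\tau_1 + \lambda_2\tau_2} \]

and $\tau_F = \tau_G = \tau_H = x_+$.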



Note that we can easily prove (as in Example 1) that assumption (15) is satisfied if we 
assume (14). 

The asymptotic bias of $\sqrt{k}\,(\hat\gamma^{(\cdot)}_{Z,k,n} - \gamma)$ can be explicitly computed (again, from Corollary 1)
and is asymptotically equivalent to

• for the UH-estimator:



7Ml-7)(l + T) 
(1 - 7-7r)(l - 7r) 
7^ ,a\,aX 



(l-7)(l-27) 
7' 



-277(1-7) + — 



(l-7)(l-27)a;+ 

• for the moment estimator: 

( 7V(l-7)(l + r)(l-27) 
(1 — 7 — 7t)(1 — 27 — 7r) 



ri{f3l'f3^'yVk 



7 



(l-7)(l-37) 



2,7(1-7)'- 



n 
k 

7 + 1 

x+ 



Vk 



• for the ML-estimator, if $\gamma > -\frac{1}{2}$:



if T< 1, 
if T = 1, 
if r > 1; 

if r< 1, 
if r = l, 
if r> 1; 



7^(1 +7)(1 + t) 
(1 — 7r)(l + 7 — 7t) 
They are all of the same order if $\tau < 1$, otherwise the biases of the moment and UH-estimators
dominate that of the ML-estimator.

Similarly to Example 1, if $\tau_1 \ne \tau_2$, or $\tau_1 = \tau_2$ and $\beta_1 \ne \beta_2$, direct computations lead to

\[ p(z) - p = \frac{\gamma^2}{\gamma_1\gamma_2}\left[-\beta_1(x_+ - z)^{\tau_1}(1+o(1)) + \beta_2(x_+ - z)^{\tau_2}(1+o(1))\right]. \]

Consequently, assumption (14) is equivalent, in that case, to

\[ \frac{\gamma^2}{\gamma_1\gamma_2}\,\frac{\tilde\beta}{1-\rho}\,(\beta_1^{\lambda_1}\beta_2^{\lambda_2})^{\rho}\,\sqrt{k}\left(\frac{n}{k}\right)^{\rho} \to \alpha_2, \]

with $\tilde\beta$ as in Example 1.
Again, this order is the same as the order of the asymptotic bias terms of all of the
estimators in case $\tau < 1$ and dominated by the one of the moment and UH-estimators
otherwise. When $\tau_1 = \tau_2$ and $\beta_1 = \beta_2$, the biases of the estimators of $\gamma$ dominate.

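Example 3. $X \stackrel{d}{=} Y \sim$ logistic.
In that case,

\[ 1 - F(x) = 1 - G(x) = \frac{2}{1 + e^{x}}, \quad x > 0. \]

Hence,

\[ U_H(x) = \log(2\sqrt{x} - 1). \]

We have $\gamma_1 = \gamma_2 = \gamma = 0$. Since $F = G$, we immediately obtain $p(\cdot) \equiv \frac{1}{2}$ and $\alpha_2 = 0$.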
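The expression for $U_H$ in this example follows by inverting the survival function of $Z$. The derivation below is a sketch assuming the survival function $2/(1+e^x)$, $x > 0$, for both $X$ and $Y$ and the definition $U_H(x) = H^{\leftarrow}(1 - 1/x)$, $x \ge 1$.

```latex
\begin{align*}
1-H(z) &= (1-F(z))(1-G(z)) = \Bigl(\frac{2}{1+e^{z}}\Bigr)^{2}, \\
1-H(z) = \frac{1}{x} &\iff 1+e^{z} = 2\sqrt{x} \iff z = \log(2\sqrt{x}-1), \\
U_H(x) &= \log(2\sqrt{x}-1)
        = \tfrac{1}{2}\log x + \log 2 + \log\Bigl(1-\frac{1}{2\sqrt{x}}\Bigr).
\end{align*}
```

The leading term $\frac{1}{2}\log x$ matches a logarithmic representation with $\gamma = 0$, and the remainder decays like $x^{-1/2}$.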

According to Corollary 1, the asymptotic bias of V^(7z fc n ~ asymptotically 
equivalent to jpg/^/fc for the UH- or the moment estimator and to ^ for the ML- 
cstimator. 

In order to illustrate these three examples, we simulate 100 samples of size 500 from 
the following distributions: 

• a Burr(10,4, 1) censored by a Burr(10, 1,0.5); 

• a reverse Burr(1, 8, 0.5, 10) censored by a reverse Burr(10, 1, 0.5, 10);

• a logistic censored by a logistic. 

For the first two examples, $p = \frac{8}{9}$, meaning that the percentage of censoring in the right
tail is close to 11%. In the last example, $p(\cdot) \equiv p = \frac{1}{2}$, that is, the percentage of censoring
is as high as 50%. In the first case, we have $\gamma_1 = \frac{1}{4}$, $\gamma = \frac{2}{9}$ and $\rho = -\frac{2}{9}$, in the second
case $\gamma_1 = -\frac{1}{4}$, $\gamma = -\frac{2}{9}$ and, again, $\rho = -\frac{2}{9}$. In the third example, $\gamma_1 = \gamma = 0$. In all three
examples, panels (a) and (c) (in Figures 2-4) represent the median for the index and the 
extreme quantile, respectively, whereas panels (b) and (d) represent the empirical mean 
square errors (MSE) based on the 100 samples. The small value of e is All of these 
plotted estimators are adapted to censoring. The horizontal line represents the true value 
of the parameter. 
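A reduced version of the first simulation setting can be sketched as follows. This is an illustration under stated assumptions, not the code behind the figures: it covers only the Hill estimator adapted to censoring, the function names `rburr`, `adapted_hill` and `simulate` are hypothetical, the Burr survival function is taken to be $(\beta/(\beta + x^{\tau}))^{\lambda}$, and the choice $k = 100$ is arbitrary.

```python
import numpy as np

def rburr(rng, size, beta, tau, lam):
    """Draw from Burr(beta, tau, lam), 1 - F(x) = (beta / (beta + x**tau))**lam,
    by inverting the survival function at a uniform variable on (0, 1]."""
    u = 1.0 - rng.random(size)
    return (beta * (u ** (-1.0 / lam) - 1.0)) ** (1.0 / tau)

def adapted_hill(z, delta, k):
    """Hill estimator on the Z-sample divided by the proportion of
    uncensored observations among the k largest Z's."""
    order = np.argsort(z)
    zs = z[order]
    ds = delta[order].astype(float)
    n = zs.size
    hill = np.log(zs[n - k:]).mean() - np.log(zs[n - k - 1])
    return hill / ds[n - k:].mean()

def simulate(n_samples=100, n=500, k=100, seed=0):
    """Median and empirical MSE of the adapted Hill estimator for a
    Burr(10, 4, 1) lifetime censored by a Burr(10, 1, 0.5) variable."""
    rng = np.random.default_rng(seed)
    gamma1 = 1.0 / (1.0 * 4.0)            # true index 1 / (lambda_1 * tau_1)
    est = np.empty(n_samples)
    for s in range(n_samples):
        x = rburr(rng, n, 10.0, 4.0, 1.0)
        y = rburr(rng, n, 10.0, 1.0, 0.5)
        est[s] = adapted_hill(np.minimum(x, y), x <= y, k)
    return float(np.median(est)), float(np.mean((est - gamma1) ** 2))

median_est, mse_est = simulate()
```

Repeating the call over a grid of `k` values gives curves of the kind described for panels (a) and (b); for finite samples the median generally differs from the true value $1/4$ because of the bias terms discussed in Example 1.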
In the first example, we can observe, in the case of the estimation of the index, the
superiority of the Hill estimator adapted to censoring in terms of MSE, the three others
being quite similar. For the extreme quantile estimators, however, there is much less to
decide between all of the estimators: they are very stable and close to the true value of
the parameter. A similar observation can be made for the second and third examples,
with a slight advantage for the UH-estimator, only in the case of the estimation of the
index.




Figure 2. A Burr(10, 4, 1) distribution censored by a Burr(10, 1, 0.5) distribution: UH-estimator
(dotted line), moment estimator (full line), ML-estimator (dashed line) and Hill estimator
(dashed-dotted line); (a) median and (b) MSE for the extreme value index; (c) median and
(d) MSE for the extreme quantile with e = ^ . 

Figure 3. A reverse Burr(1, 8, 0.5, 10) distribution censored by a reverse Burr(10, 1, 0.5, 10)
distribution: UH-estimator (dotted line), moment estimator (full line), ML-estimator (dashed
line); (a) median and (b) MSE for the extreme value index; (c) median and (d) MSE for the
extreme quantile with e = ^ . 



4. Application to AIDS survival data 

We return to our real data set presented in Section 1 and used in Section 2, that is, the 
Australian AIDS survival data for the male patients diagnosed before 1 July 1991. The 
sample size is 2754. 

First, we estimate $p = \lim_{z\to\tau_H} p(z)$. In Figure 5, we see $\hat p$ as a function of $k$. Clearly,
there is a stable part in the plot when $k$ ranges from about 75 to 175; for higher $k$,
the bias sets in. Note that $\hat p$ is the mean of 0-1 variables, so for a sample of this size,
the estimator is already very accurate. Therefore, we estimate $p$ with the corresponding
vertical level in the plot, which is 0.28.

Figure 4. A logistic distribution censored by a logistic distribution: UH-estimator (dotted
line), moment estimator (full line), ML-estimator (dashed line); (a) median and (b) MSE for
the extreme value index; (c) median and (d) MSE for the extreme quantile with e = -i . 



We now continue with the estimation of the extreme value index $\gamma_1$ and an extreme
quantile $F^{\leftarrow}(1 - \varepsilon)$, using the UH-method (as in Section 2). We will again plot these
estimators as functions of $k$, but already replacing $\hat p = \hat p(k)$ with its estimate 0.28 in
order to prevent the bias from playing a dominant role for values of $k$ larger than 200, say.

In Figure 6(a), the estimator of the extreme value index is presented, whereas Figure 
6(b) shows the extreme quantile estimator for $\varepsilon = 0.001$. The estimator of $\gamma_1$ is quite
stable for values of k between 200 and 300. We estimate it with 0.14. We estimate the 
extreme quantile with k values in the same range because that range again gives a stable 
part in the plot. The corresponding estimated survival time is as high as about 25 years. 
So, although the estimated median survival time has the low value 1.3 years, we find that 
exceptionally strong males can survive AIDS for 25 years. 
Figure 5. Estimator of $p$ for the Australian AIDS survival data for the male patients.




Figure 6. UH-estimator (a) for the extreme value index and (b) for the extreme quantile with
$\varepsilon = 0.001$, for the Australian AIDS survival data for the male patients.